feat: implement validators, validation strategies, and RemapField transformer#533
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR adds runtime components to support config migrations to low-code by introducing new validators, validation strategies, and a configuration transformer.
- Introduces a Validator base class and implementations (ValidateAdheresToSchema, PredicateValidator, DpathValidator) with comprehensive unit tests.
- Implements a ValidationStrategy base class and a schema-based validation strategy.
- Adds a config transformation (RemapField) that remaps field values based on a provided mapping.
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| unit_tests/sources/declarative/validators/test_validate_adheres_to_schema.py | New unit tests covering JSON schema adherence validation. |
| unit_tests/sources/declarative/validators/test_predicate_validator.py | New unit tests for PredicateValidator validating both success and error scenarios. |
| unit_tests/sources/declarative/validators/test_dpath_validator.py | New unit tests for DpathValidator covering valid paths, error handling, and wildcard use. |
| unit_tests/sources/declarative/transformations/config_transformations/test_remap_field.py | New unit tests for RemapField ensuring correct field remapping behavior. |
| airbyte_cdk/sources/declarative/validators/*.py | New validator implementations and base classes supporting unified validation strategies. |
| airbyte_cdk/sources/declarative/transformations/config_transformations/*.py | New configuration transformation implementation for remapping field values. |
📝 WalkthroughWalkthroughThis change introduces a new suite of configuration transformation and validation utilities in the Airbyte CDK. It adds abstract base classes, concrete implementations for adding, removing, and remapping configuration fields, and a set of validators including schema adherence and dpath-based validation. Comprehensive unit tests for all new functionality are also included. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant ConfigTransformation
participant ConfigAddFields
participant ConfigRemoveFields
participant ConfigRemapField
User->>ConfigAddFields: transform(config)
ConfigAddFields->>ConfigAddFields: (Evaluate fields and condition)
ConfigAddFields->>ConfigAddFields: (Add fields to config)
User->>ConfigRemoveFields: transform(config)
ConfigRemoveFields->>ConfigRemoveFields: (Evaluate condition)
ConfigRemoveFields->>ConfigRemoveFields: (Remove fields by path)
User->>ConfigRemapField: transform(config)
ConfigRemapField->>ConfigRemapField: (Resolve field path)
ConfigRemapField->>ConfigRemapField: (Remap value if in map)
sequenceDiagram
participant User
participant Validator
participant DpathValidator
participant PredicateValidator
participant ValidateAdheresToSchema
participant ValidationStrategy
User->>DpathValidator: validate(input_data)
DpathValidator->>DpathValidator: (Extract value by path)
DpathValidator->>ValidationStrategy: validate(value)
User->>PredicateValidator: validate()
PredicateValidator->>ValidationStrategy: validate(value)
User->>ValidateAdheresToSchema: validate(value)
ValidateAdheresToSchema->>ValidateAdheresToSchema: (Parse value if JSON string)
ValidateAdheresToSchema->>ValidateAdheresToSchema: (Validate against schema)
Suggested reviewers
Would you like to consider involving additional reviewers familiar with the Airbyte CDK validation or transformation patterns, wdyt? Tip ⚡️ Faster reviews with caching
Enjoy the performance boost—your workflow just got faster. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms (9)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (16)
airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py (1)
9-23: Type hint mismatch between base class and implementation, wdyt?The
transformmethod's parameter type isDict[str, Any], but the implementation inremap_field.pyusesMutableMapping[str, Any]. Since the method is meant to modify the config in-place,MutableMappingseems more appropriate here. Would you consider changing this to be consistent with the implementation?- def transform( - self, - config: Dict[str, Any], - ) -> None: + def transform( + self, + config: MutableMapping[str, Any], + ) -> None:Also, you might want to add the missing import:
-from typing import Any, Dict +from typing import Any, Dict, MutableMappingairbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (2)
23-34: Possible redundancy in field path conversion, wdyt?The initialization logic first converts all paths to
InterpolatedStringusing a list comprehension, then loops through them again to check if they're strings and convert them. Could we simplify this logic since we're already handling everything in the list comprehension?def __post_init__(self) -> None: if not self.field_path: raise Exception("field_path cannot be empty.") - self._field_path = [ - InterpolatedString.create(path, parameters={}) for path in self.field_path - ] - for path_index in range(len(self.field_path)): - if isinstance(self.field_path[path_index], str): - self._field_path[path_index] = InterpolatedString.create( - self.field_path[path_index], parameters={} - ) + self._field_path = [] + for path in self.field_path: + if isinstance(path, str): + self._field_path.append(InterpolatedString.create(path, parameters={})) + else: + self._field_path.append(path)Also, consider using a more specific exception type with a clearer error message:
if not self.field_path: - raise Exception("field_path cannot be empty.") + raise ValueError("field_path cannot be empty. Please provide at least one path component.")
35-60: Implementation seems robust!The transform logic safely navigates the nested dictionary structure and only applies transformations when appropriate. I like the early returns when paths don't exist or aren't mappings, which makes the code cleaner and prevents errors.
One suggestion - consider adding a log statement when a remapping occurs, which could be helpful for debugging:
if field_name in current and current[field_name] in self.map: + # Could add logging here if needed + # e.g., logger.debug(f"Remapping {field_name} from {current[field_name]} to {self.map[current[field_name]]}") current[field_name] = self.map[current[field_name]]airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
1-27: Simple but effective validator implementation, wdyt about a small enhancement?The
PredicateValidatorclass looks well-structured with a clear single responsibility - it stores a value and delegates validation to a provided strategy. The implementation is clean and the docstrings are clear.I notice that this class doesn't inherit from the
Validatorabstract base class, while it's placed in the validators package. This might be intentional based on the different method signatures (validate()vsvalidate(input_data: Any)), but it could be worth considering whether it should implement the common interface for consistency. Alternatively, renaming it to distinguish its different role might be helpful.Would adding type validation for the
strategyparameter in the constructor make sense to catch configuration errors earlier? Just a thought!airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (2)
21-31: Implementation is concise and straightforwardNice implementation of the schema validation strategy! It's clean and effectively wraps the jsonschema library while providing a good error message.
One question: should we also catch
jsonschema.SchemaErrorseparately? It happens when the schema itself is invalid (as shown in your test_given_invalid_schema_when_validate_then_raises_error test), and we might want to distinguish between invalid schema vs invalid data in the error message, wdyt?try: jsonschema.validate(instance=value, schema=self.schema) except jsonschema.ValidationError as e: raise ValueError(f"JSON schema validation error: {e.message}") +except jsonschema.SchemaError as e: + raise ValueError(f"Invalid JSON schema: {e.message}")
21-31: Suggestion: Add support for JSON string inputsI noticed that your test suite includes a test for validating JSON strings (
test_given_json_string_when_validate_then_succeeds), but the implementation doesn't explicitly handle JSON string inputs. The test is passing, which suggests jsonschema might handle this automatically, but it could be more robust to add explicit handling, wdyt?def validate(self, value: Any) -> None: """ Validates the value against the JSON schema. :param value: The value to validate :raises ValueError: If the value does not adhere to the schema """ + # If value is a JSON string, parse it first + import json + if isinstance(value, str): + try: + value = json.loads(value) + except json.JSONDecodeError as e: + raise ValueError(f"Invalid JSON string: {e}") + try: jsonschema.validate(instance=value, schema=self.schema) except jsonschema.ValidationError as e: raise ValueError(f"JSON schema validation error: {e.message}")airbyte_cdk/sources/declarative/validators/dpath_validator.py (3)
25-34: Simplify the path initializationThe current implementation of
__post_init__looks like it's doing the same conversion twice. You first createself._field_pathby converting all elements, then you iterate throughself.field_pathagain to convert any string elements. I think we can simplify this, wdyt?def __post_init__(self) -> None: - self._field_path = [ - InterpolatedString.create(path, parameters={}) for path in self.field_path - ] - for path_index in range(len(self.field_path)): - if isinstance(self.field_path[path_index], str): - self._field_path[path_index] = InterpolatedString.create( - self.field_path[path_index], parameters={} - ) + self._field_path = [ + InterpolatedString.create(path, parameters={}) for path in self.field_path + ]
47-65: Refactor to reduce code duplicationThe validation logic for wildcard and non-wildcard paths contains duplicated code. We could extract the common validation logic into a helper method to make the code more DRY, wdyt?
if "*" in path: try: values = dpath.values(input_data, path) except KeyError as e: raise KeyError(f"Error validating path '{self.field_path}': {e}") for value in values: - try: - self.strategy.validate(value) - except Exception as e: - raise ValueError(f"Error validating value '{value}': {e}") + self._validate_value(value) else: try: value = dpath.get(input_data, path) except KeyError as e: raise KeyError(f"Error validating path '{self.field_path}': {e}") - try: - self.strategy.validate(value) - except Exception as e: - raise ValueError(f"Error validating value '{value}': {e}") + self._validate_value(value) +def _validate_value(self, value: Any) -> None: + try: + self.strategy.validate(value) + except Exception as e: + raise ValueError(f"Error validating value '{value}': {e}")
35-42: Consider exposing a config parameter for path interpolationCurrently, you're passing an empty dictionary to the
evalmethod. If the field paths contain Jinja templates, they would need context to be properly evaluated. Would it make sense to accept a config parameter in the validate method and pass that to the eval calls, wdyt?-def validate(self, input_data: dict[str, Any]) -> None: +def validate(self, input_data: dict[str, Any], config: dict[str, Any] = None) -> None: """ Extracts the value at the specified path and applies the validation strategy. :param input_data: Dictionary containing the data to validate + :param config: Optional configuration for interpolating path elements :raises ValueError: If the path doesn't exist or validation fails """ - path = [path.eval({}) for path in self._field_path] + config = config or {} + path = [path.eval(config) for path in self._field_path]unit_tests/sources/declarative/validators/test_predicate_validator.py (1)
1-7: Add missing copyright headerThe copyright header is missing from this file. All other files in this PR have it, wdyt?
+# +# Copyright (c) 2025 Airbyte, Inc., all rights reserved. +# + from unittest import TestCase import pytestunit_tests/sources/declarative/validators/test_validate_adheres_to_schema.py (2)
128-130: Test could be more thorough for JSON string inputsThe test for validating JSON strings doesn't make assertions about the actual validation - it just checks that no exception is raised. Consider testing both valid and invalid JSON strings to ensure the validation is working as expected, wdyt?
def test_given_json_string_when_validate_then_succeeds(self): schema = { "type": "object", "properties": { "id": {"type": "integer"}, "name": {"type": "string"}, }, } validator = ValidateAdheresToSchema(schema=schema) validator.validate('{"id": 1, "name": "John Doe"}') + + # Test invalid JSON string + with pytest.raises(ValueError) as exc_info: + validator.validate('{"id": "not-an-integer", "name": "John Doe"}') + + assert "integer" in str(exc_info.value) + + # Test malformed JSON string + with pytest.raises(ValueError) as exc_info: + validator.validate('{bad json}') + + assert "Invalid JSON" in str(exc_info.value)
1-7: Add missing copyright headerThe copyright header is missing from this file. All other files in this PR have it, wdyt?
+# +# Copyright (c) 2025 Airbyte, Inc., all rights reserved. +# + from unittest import TestCase import jsonschema import pytestunit_tests/sources/declarative/validators/test_dpath_validator.py (4)
32-33: Could the assertion be more specific?The assertion
assert strategy.validated_valueonly checks that the value is truthy. Would it be more precise to assert the exact expected value:assert strategy.validated_value == "test@example.com"? wdyt?
91-93: Redundant assertions?There are two assertions checking the same thing, once with pytest's
assertand once with unittest'sassertIn. Would it be cleaner to stick with one assertion style for consistency? wdyt?assert strategy.validate_called assert strategy.validated_value in ["user1@example.com", "user2@example.com"] -self.assertIn(strategy.validated_value, ["user1@example.com", "user2@example.com"])Or if you prefer unittest style:
assert strategy.validate_called -assert strategy.validated_value in ["user1@example.com", "user2@example.com"] self.assertIn(strategy.validated_value, ["user1@example.com", "user2@example.com"])
23-94: Consider adding more wildcard test casesThe tests are comprehensive, but would it be valuable to add a test with multiple wildcards in the path, like
["users", "*", "addresses", "*", "city"]? This would verify that nested wildcards work correctly. Additionally, the current wildcard test only verifies that the strategy was called with one of the expected values, but not that it was called for all matching paths. wdyt?
23-94: Consider adding docstrings to test methodsThe tests are well-structured and have descriptive names, but would adding docstrings to explain the purpose and expectations of each test help future developers understand the test cases better? wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py(1 hunks)airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py(1 hunks)airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py(1 hunks)airbyte_cdk/sources/declarative/validators/__init__.py(1 hunks)airbyte_cdk/sources/declarative/validators/dpath_validator.py(1 hunks)airbyte_cdk/sources/declarative/validators/predicate_validator.py(1 hunks)airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py(1 hunks)airbyte_cdk/sources/declarative/validators/validation_strategy.py(1 hunks)airbyte_cdk/sources/declarative/validators/validator.py(1 hunks)unit_tests/sources/declarative/transformations/config_transformations/test_remap_field.py(1 hunks)unit_tests/sources/declarative/validators/test_dpath_validator.py(1 hunks)unit_tests/sources/declarative/validators/test_predicate_validator.py(1 hunks)unit_tests/sources/declarative/validators/test_validate_adheres_to_schema.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (10)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (4)
airbyte_cdk/sources/declarative/validators/dpath_validator.py (1)
validate(35-65)airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
validate(20-26)airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (1)
validate(21-31)airbyte_cdk/sources/declarative/validators/validator.py (1)
validate(11-18)
unit_tests/sources/declarative/validators/test_predicate_validator.py (3)
airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
PredicateValidator(12-26)airbyte_cdk/sources/declarative/validators/validation_strategy.py (1)
ValidationStrategy(9-22)unit_tests/sources/declarative/validators/test_dpath_validator.py (2)
MockValidationStrategy(9-20)validate(16-20)
airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py (1)
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1)
RemapField(15-60)
airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (2)
ValidationStrategy(9-22)validate(15-22)
airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py (1)
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1)
transform(35-60)
airbyte_cdk/sources/declarative/validators/__init__.py (5)
airbyte_cdk/sources/declarative/validators/dpath_validator.py (1)
DpathValidator(16-65)airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
PredicateValidator(12-26)airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (1)
ValidateAdheresToSchema(14-31)airbyte_cdk/sources/declarative/validators/validation_strategy.py (1)
ValidationStrategy(9-22)airbyte_cdk/sources/declarative/validators/validator.py (1)
Validator(9-18)
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (3)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (2)
ValidationStrategy(9-22)validate(15-22)airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
validate(20-26)airbyte_cdk/sources/declarative/validators/validator.py (1)
validate(11-18)
airbyte_cdk/sources/declarative/validators/validator.py (4)
airbyte_cdk/sources/declarative/validators/dpath_validator.py (1)
validate(35-65)airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
validate(20-26)airbyte_cdk/sources/declarative/validators/validation_strategy.py (1)
validate(15-22)airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (1)
validate(21-31)
unit_tests/sources/declarative/transformations/config_transformations/test_remap_field.py (1)
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (2)
RemapField(15-60)transform(35-60)
airbyte_cdk/sources/declarative/validators/dpath_validator.py (4)
airbyte_cdk/sources/declarative/interpolation/interpolated_string.py (1)
InterpolatedString(13-79)airbyte_cdk/sources/declarative/validators/validation_strategy.py (2)
ValidationStrategy(9-22)validate(15-22)airbyte_cdk/sources/declarative/validators/validator.py (2)
Validator(9-18)validate(11-18)unit_tests/sources/declarative/validators/test_dpath_validator.py (1)
validate(16-20)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: SDM Docker Image Build
🔇 Additional comments (15)
airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py (1)
5-7: Imports and exports look good!Clean organization of imports and exports using
__all__for explicitly defining the module's public API. This makes the code more maintainable and provides clear documentation of what's available from this package.airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1)
14-22: Class structure looks good!Clean dataclass implementation with well-defined attributes and clear docstring. The typing is precise and allows for both interpolated strings and raw strings in the field path.
unit_tests/sources/declarative/transformations/config_transformations/test_remap_field.py (7)
11-36: Well-structured test for valid inputs!Great test with thorough assertions including verification that the original config remains unchanged. The use of deepcopy ensures test isolation and makes the test more reliable.
37-51: Good edge case handling!I like that you're testing the case where the value isn't in the map and verifying that the configuration remains unchanged.
52-67: Good test for non-existent paths!Testing that the configuration remains unchanged when the path doesn't exist is important for ensuring the code behaves correctly in edge cases.
68-82: Great test for interpolated paths!Testing the interpolation functionality demonstrates that dynamic paths work correctly.
83-92: Good edge case for empty map!Testing with an empty map confirms that no changes are made when there's nothing to remap.
93-96: Good validation test!Testing that an empty field path raises an exception validates the input checking in the constructor.
97-112: Excellent test for sequential transformations!This test confirms that multiple transformations can be applied in sequence, which is important for validating how the transformations fit into a larger pipeline.
airbyte_cdk/sources/declarative/validators/validator.py (1)
1-19: Well-designed abstract base class, looks good!This
ValidatorABC provides a clean foundation for the validation framework. The docstring is clear and the abstract method defines a consistent contract for concrete implementations to follow.The interface is simple yet powerful - accepting any input data and raising ValueError for validation failures. This will make it straightforward to create various validation implementations.
airbyte_cdk/sources/declarative/validators/validation_strategy.py (1)
1-23: Clear strategy interface design, nicely done!The
ValidationStrategyabstract base class creates a clean separation of concerns between validators (which extract/provide data) and strategies (which perform actual validation logic). This design pattern will make it easier to mix and match different validation approaches.The method signature and docstring provide a clear contract for implementations. I particularly like that this approach allows strategies to be reused across different validators, which promotes code reuse.
airbyte_cdk/sources/declarative/validators/__init__.py (1)
1-20: Well-organized package exportsThe
__init__.pyfile appropriately collects and exports all the validator classes, providing a clean public API for the package. This makes it easy for consumers to import the validators they need directly from the package.The exports include both abstract base classes (
ValidatorandValidationStrategy) and concrete implementations, giving users access to the full validation framework.unit_tests/sources/declarative/validators/test_dpath_validator.py (3)
1-7: Clean imports with good organization!The imports are well organized, following the standard practice of stdlib, third-party, and local imports. Nice job organizing them with proper spacing!
9-21: Great mock implementation!The
MockValidationStrategyis well-designed with configurable behavior and tracking mechanisms. This makes testing much easier and provides good verification capabilities. Love the clear naming and simple implementation.
61-77: Good edge case testing!Nice job testing the empty path list and empty input data scenarios. These edge cases are important to verify the expected behavior of the validator.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (1)
36-40: Good error handling with descriptive error messages.The exception handling here is clean and the error message includes the original validation error, which will be helpful for debugging.
One small thing - have you considered adding a check for empty schemas? An empty schema will validate anything, which might not be the intended behavior in some cases. wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (4)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (2)
ValidationStrategy(9-22)validate(15-22)airbyte_cdk/sources/declarative/validators/dpath_validator.py (1)
validate(35-65)airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
validate(20-26)airbyte_cdk/sources/declarative/validators/validator.py (1)
validate(11-18)
🪛 GitHub Actions: Linters
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py
[error] 5-11: Ruff: Import block is un-sorted or un-formatted. Organize imports. Fixable with the --fix option.
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (2)
14-21: The class definition looks clean and well-documented.The
@dataclassapproach works well here, and the schema type is properly defined. The class documentation clearly explains its purpose.
22-35: Nice handling of string inputs by attempting JSON parsing.I like how you handle the case where a string input might contain JSON data. The error message is clear and helpful.
One question though - should we consider validating the schema itself before using it? Sometimes invalid schemas can lead to unexpected validation results. wdyt?
Brian Lai (brianjlai)
left a comment
There was a problem hiding this comment.
Looks good and everything looks good. We talked about adding the ConfigTransformation implementations for AddFields and RemoveFields. But the rest we can add as needed during migrations. And the comment about the mapping. Will approve so I don't block you, but happy to give the final look if you want to once you add the other transformations
There was a problem hiding this comment.
Actionable comments posted: 0
🔭 Outside diff range comments (2)
unit_tests/sources/declarative/transformations/config_transformations/test_remove_fields.py (1)
1-101:⚠️ Potential issueFix import file mismatch error
The pipeline is failing with an import file mismatch error, which suggests there might be cached
.pycfiles or a naming conflict with another test file.#!/bin/bash # List all test files with similar names to identify potential conflicts find unit_tests -name "test_remove_fields*.py" # Check for __pycache__ directories that might contain conflicting .pyc files find unit_tests -name "__pycache__" -type dHave you considered cleaning the pycache directories before running the tests? Alternatively, you could rename this test file to something more unique, like
test_remove_fields_transformation.py, wdyt?🧰 Tools
🪛 GitHub Actions: PyTest Matrix
[error] 1-1: Import file mismatch error: imported module 'test_remove_fields' points to a different file path than the test file collected. Hint: remove pycache / .pyc files or use unique basenames for test modules.
unit_tests/sources/declarative/transformations/config_transformations/test_add_fields.py (1)
1-98:⚠️ Potential issueFix import file mismatch error
Similar to the other test file, there's a pipeline failure due to an import file mismatch error.
#!/bin/bash # List all test files with similar names to identify potential conflicts find unit_tests -name "test_add_fields*.py" # Check for __pycache__ directories that might contain conflicting .pyc files find unit_tests -name "__pycache__" -type dCould we rename this test file to something more unique like
test_add_fields_transformation.pyto avoid conflicts? Or maybe cleaning the pycache directories before running tests would help, wdyt?🧰 Tools
🪛 GitHub Actions: Pytest (Fast)
[error] 1-1: Import file mismatch error: imported module 'test_add_fields' points to a different file path than the test file to collect. Hint: remove pycache / .pyc files and/or use a unique basename for your test file modules.
🪛 GitHub Actions: PyTest Matrix
[error] 1-1: Import file mismatch error: imported module 'test_add_fields' points to a different file path than the test file collected. Hint: remove pycache / .pyc files or use unique basenames for test modules.
🧹 Nitpick comments (5)
airbyte_cdk/sources/declarative/transformations/config_transformations/remove_fields.py (1)
47-49: Consider validating field_pointersHave you considered adding validation to ensure that
field_pointersis not empty? This could prevent unexpected behavior if an empty list is provided.Maybe something like this would be helpful, wdyt?
def __post_init__(self) -> None: + if not self.field_pointers: + logger.warning("RemoveFields instantiated with empty field_pointers list. No fields will be removed.") self._filter_interpolator = InterpolatedBoolean(condition=self.condition, parameters={})airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py (4)
1-4: License header looks good.The copyright year is set to 2025, which is in the future. Is this intentional or should it be the current year instead? wdyt?
67-72: Class attributes look well defined.Good use of field() for internal state management. Would it make sense to add documentation for the
conditionparameter since it's an important part of the functionality? wdyt?
73-103: Initialization logic effectively validates and processes input.The validation of field paths ensures that empty paths are rejected, which is good defensive programming. One potential improvement: would it be valuable to add validation for the path format beyond just checking length? For instance, ensuring the path elements are all strings or valid dictionary keys?
104-121: Transform implementation looks solid but could use a safeguard.The transform method correctly implements the required behavior, but there's no protection against overwriting existing fields. Would it be helpful to add an optional parameter to control whether overwriting existing fields is allowed? This could prevent accidental data loss in some scenarios.
Also, have you considered adding error handling around the dpath.new call to provide more descriptive error messages if something goes wrong? This would make debugging easier for users.
@dataclass class AddFields(ConfigTransformation): # ... existing attributes ... + allow_overwrite: bool = True # Controls whether existing fields can be overwritten # ... rest of the class ... def transform( self, config: MutableMapping[str, Any], ) -> None: # ... existing code ... for parsed_field in self._parsed_fields: valid_types = (parsed_field.value_type,) if parsed_field.value_type else None value = parsed_field.value.eval(config, valid_types=valid_types) is_empty_condition = not self.condition if is_empty_condition or self._filter_interpolator.eval( config, value=value, path=parsed_field.path ): + # Check if field exists and respect allow_overwrite setting + try: + if not self.allow_overwrite and dpath.util.search(config, parsed_field.path, yielded=True): + continue # Skip this field as it exists and overwriting is not allowed dpath.new(config, parsed_field.path, value) + except Exception as e: + raise ValueError(f"Failed to add field at path {parsed_field.path}: {str(e)}")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py(1 hunks)airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py(1 hunks)airbyte_cdk/sources/declarative/transformations/config_transformations/remove_fields.py(1 hunks)unit_tests/sources/declarative/transformations/config_transformations/test_add_fields.py(1 hunks)unit_tests/sources/declarative/transformations/config_transformations/test_remove_fields.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
unit_tests/sources/declarative/transformations/config_transformations/test_add_fields.py (1)
airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py (3)
AddedFieldDefinition(19-24)AddFields(37-120)transform(104-120)
airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py (3)
airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py (1)
AddFields(37-120)airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1)
RemapField(15-60)airbyte_cdk/sources/declarative/transformations/config_transformations/remove_fields.py (1)
RemoveFields(21-72)
unit_tests/sources/declarative/transformations/config_transformations/test_remove_fields.py (1)
airbyte_cdk/sources/declarative/transformations/config_transformations/remove_fields.py (2)
RemoveFields(21-72)transform(53-72)
airbyte_cdk/sources/declarative/transformations/config_transformations/remove_fields.py (2)
airbyte_cdk/sources/declarative/interpolation/interpolated_boolean.py (1)
InterpolatedBoolean(29-66)airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py (1)
ConfigTransformation(9-23)
🪛 GitHub Actions: Pytest (Fast)
unit_tests/sources/declarative/transformations/config_transformations/test_add_fields.py
[error] 1-1: Import file mismatch error: imported module 'test_add_fields' points to a different file path than the test file to collect. Hint: remove pycache / .pyc files and/or use a unique basename for your test file modules.
🪛 GitHub Actions: PyTest Matrix
unit_tests/sources/declarative/transformations/config_transformations/test_add_fields.py
[error] 1-1: Import file mismatch error: imported module 'test_add_fields' points to a different file path than the test file collected. Hint: remove pycache / .pyc files or use unique basenames for test modules.
unit_tests/sources/declarative/transformations/config_transformations/test_remove_fields.py
[error] 1-1: Import file mismatch error: imported module 'test_remove_fields' points to a different file path than the test file collected. Hint: remove pycache / .pyc files or use unique basenames for test modules.
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: SDM Docker Image Build
🔇 Additional comments (21)
airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py (1)
5-9: Export classes look good!The module exports three transformation classes that handle different aspects of config manipulation - adding fields, remapping fields, and removing fields. This provides a comprehensive API for configuration transformations.
unit_tests/sources/declarative/transformations/config_transformations/test_remove_fields.py (5)
7-10: Clean import structureThe import looks good, directly importing the class under test.
12-30: Test case covers basic field removal correctlyThis test verifies that top-level fields are properly removed while preserving unrelated fields. Good assertions to check both presence and absence of fields.
33-62: Comprehensive test for nested field removalGood test coverage for removing nested fields at different levels of the hierarchy while preserving unrelated fields. The explicit assertion of the final config structure is helpful.
65-81: Great test for handling non-existent fieldsThis test verifies that the transformation gracefully handles attempts to remove fields that don't exist at various nesting levels. Nice use of
copy()to verify the config remains unchanged.
84-100: Good conditional logic testThis test effectively verifies that the transformation only removes fields when the specified condition is met. The test cases cover both true and false conditions.
unit_tests/sources/declarative/transformations/config_transformations/test_add_fields.py (7)
7-10: Clean importsGood job importing both the class under test and its supporting data class.
13-27: Test for static values looks goodThis test effectively verifies that static values are correctly added to an empty config. Good use of assertions to check the resulting structure.
30-40: Nested fields test is well-structuredThis test verifies that nested fields are correctly created, including all parent objects. The assertion clearly validates the expected nested structure.
43-58: Thorough interpolation testGood test for interpolated values that covers both reference to existing fields and expression evaluation. Nice verification that the original field remains unchanged.
61-66: Good validation testingThis test correctly verifies that invalid inputs (empty path or non-string value) raise appropriate exceptions. Using pytest.raises is the right approach here.
69-79: Field overwrite test is clearThis test confirms that existing fields are overwritten rather than preserved, which is important behavior to document and verify.
82-97: Good conditional testThis test effectively verifies the conditional logic for adding fields, testing both when the condition is true and when it's false.
airbyte_cdk/sources/declarative/transformations/config_transformations/remove_fields.py (4)
4-17: Good imports and logger setupThe imports look appropriate for the functionality being implemented. Using a dedicated logger for the airbyte namespace is a good practice.
20-49: Well-documented class definitionThe class documentation is thorough, explaining the behavior, edge cases (especially around list handling), and providing usage examples. The YAML example is particularly helpful.
50-52: Good initializationThe
__post_init__method correctly initializes theInterpolatedBooleanfor condition evaluation.
53-72: Robust transform implementationThe transform method has good error handling, gracefully handling missing paths and properly logging other exceptions. The condition check at the beginning is efficient, avoiding unnecessary work when conditions aren't met.
airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py (4)
5-16: Import structure looks clean and well-organized.Good job organizing imports by category and using absolute imports. The types are properly imported as well.
18-25: Well-structured dataclass for field definitions.Nice use of the frozen=True attribute for immutability. The type annotations are clear and the documentation explains the purpose well.
27-34: Internal parsed field representation looks good.I like the separation between the external API (AddedFieldDefinition) and the internal representation (ParsedAddFieldDefinition). This provides clean separation of concerns.
36-66: Documentation for AddFields is thorough and helpful.Excellent docstring with clear examples demonstrating different use cases. The YAML examples are particularly helpful for users.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1)
20-21: Consider supporting dynamic values through InterpolatedMapping?The static mapping might be limiting for scenarios where we want to interpolate values dynamically. For example, when remapping a field to a value that incorporates parts of the existing config like
{{ config.url }}/v1/etc. A previous reviewer noted this might be useful for cases like the Amazon Seller Partner connector, wdyt?
🧹 Nitpick comments (9)
airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py (1)
5-9: Concise and well-structured exports!The import structure and
__all__declaration look good, providing a clean public API for the config transformations package.Have you considered also exporting the base
ConfigTransformationclass for users who might want to create their own custom transformations? This would make the inheritance hierarchy more discoverable, wdyt?unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.py (1)
68-82: Nice test for interpolated pathsThe test effectively demonstrates the dynamic path interpolation capability, which is a powerful feature. Have you considered adding a test case for when the interpolated value doesn't exist or evaluates to something unexpected (like a non-string)? This would ensure robust error handling, wdyt?
unit_tests/sources/declarative/transformations/config_transformations/test_config_remove_fields.py (1)
1-99: Consider adding tests for array element removalThe tests are comprehensive for object field removal, but I noticed that according to the implementation in
remove_fields.py, the transformation also supports removing elements from arrays by setting them toNone. Would it be valuable to add a test case that verifies this behavior specifically, wdyt?For example:
def test_given_array_element_field_pointer_element_is_set_to_none(): transformation = RemoveFields( field_pointers=[["array", 1]] # Remove second element ) config = {"array": ["keep", "remove", "also_keep"]} transformation.transform(config) assert config["array"] == ["keep", None, "also_keep"]airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (2)
23-34: Simplified initialization logic possible?The post-init logic seems to be doing redundant work. You first create a new field_path list with all elements converted to InterpolatedString, then loop through again to check and convert any remaining strings. Perhaps this could be simplified to a single loop?
- self._field_path = [ - InterpolatedString.create(path, parameters={}) for path in self.field_path - ] - for path_index in range(len(self.field_path)): - if isinstance(self.field_path[path_index], str): - self._field_path[path_index] = InterpolatedString.create( - self.field_path[path_index], parameters={} - ) + self._field_path = [] + for path in self.field_path: + if isinstance(path, str): + self._field_path.append(InterpolatedString.create(path, parameters={})) + else: + self._field_path.append(path)
48-60: Consider adding debug or error loggingThe transform method silently returns in several cases where the path doesn't exist or intermediate values aren't mappings. Adding debug logging might help with troubleshooting when the transformation doesn't behave as expected. Also, you might want to provide a return value indicating if a transformation was actually applied?
airbyte_cdk/sources/declarative/transformations/config_transformations/remove_fields.py (1)
62-66: Consider adding debug loggingThe error handling catches PathNotFound but doesn't log any information. For debugging purposes, it might be helpful to add debug logs when a path is not found. This could help identify potential configuration issues without causing runtime failures.
try: dpath.delete(config, pointer) except dpath.exceptions.PathNotFound: + # Using logging here instead of print for proper integration with the framework's logging system + import logging + logging.debug(f"Path {pointer} not found in config during removal") passairbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py (3)
18-25: Well-structured field definition dataclassThe frozen dataclass design is a good choice for immutability of field definitions. One small suggestion - would it make sense to add a docstring explaining what value_type is for and how it's used? It's not immediately obvious from the context.
73-102: Post-initialization logic could be simplifiedThe logic for processing field values is a bit verbose with nested if-else statements. Consider refactoring to reduce nesting:
def __post_init__(self) -> None: self._filter_interpolator = InterpolatedBoolean(condition=self.condition, parameters={}) for add_field in self.fields: if len(add_field.path) < 1: raise ValueError( f"Expected a non-zero-length path for the AddFields transformation {add_field}" ) - if not isinstance(add_field.value, InterpolatedString): - if not isinstance(add_field.value, str): - raise ValueError( - f"Expected a string value for the AddFields transformation: {add_field}" - ) - else: - self._parsed_fields.append( - ParsedAddFieldDefinition( - add_field.path, - InterpolatedString.create(add_field.value, parameters={}), - value_type=add_field.value_type, - ) - ) - else: - self._parsed_fields.append( - ParsedAddFieldDefinition( - add_field.path, - add_field.value, - value_type=add_field.value_type, - ) - ) + # Handle the value based on its type + if isinstance(add_field.value, InterpolatedString): + value = add_field.value + elif isinstance(add_field.value, str): + value = InterpolatedString.create(add_field.value, parameters={}) + else: + raise ValueError( + f"Expected a string value for the AddFields transformation: {add_field}" + ) + + self._parsed_fields.append( + ParsedAddFieldDefinition( + add_field.path, + value, + value_type=add_field.value_type, + ) + )
117-119: Consider field-specific conditionsCurrently, the same condition is evaluated for all fields. Would it make sense to support field-specific conditions in the AddedFieldDefinition class? This would provide more granular control over when each field is added.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py(1 hunks)airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py(1 hunks)airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py(1 hunks)airbyte_cdk/sources/declarative/transformations/config_transformations/remove_fields.py(1 hunks)unit_tests/sources/declarative/transformations/config_transformations/test_config_add_fields.py(1 hunks)unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.py(1 hunks)unit_tests/sources/declarative/transformations/config_transformations/test_config_remove_fields.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
unit_tests/sources/declarative/transformations/config_transformations/test_config_remove_fields.py (1)
airbyte_cdk/sources/declarative/transformations/config_transformations/remove_fields.py (2)
ConfigRemoveFields(18-66)transform(50-66)
airbyte_cdk/sources/declarative/transformations/config_transformations/remove_fields.py (2)
airbyte_cdk/sources/declarative/interpolation/interpolated_boolean.py (1)
InterpolatedBoolean(29-66)airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py (1)
ConfigTransformation(9-23)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: SDM Docker Image Build
🔇 Additional comments (23)
unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.py (7)
6-8: Good import with convenient aliasThe alias makes the code more readable by shortening the class name in the tests. Nice approach!
11-36: Thorough test for field remapping behaviorThe test comprehensively verifies that the target field is remapped while other fields remain unchanged. Great attention to detail with the deepcopy to verify against the original state!
37-51: Proper handling of values not in mappingGood test case ensuring that fields with values not present in the mapping dictionary remain unchanged. This verifies the transformation doesn't modify fields unnecessarily.
52-67: Robust handling of missing field pathsThis test properly verifies that the configuration remains unchanged when the field path doesn't exist in the config. This ensures the transformation is non-destructive when dealing with unexpected inputs.
83-92: Empty map handling verifiedGood test ensuring that an empty mapping dictionary doesn't modify the configuration. This is important for defensive programming.
93-96: Appropriate exception validation for empty field pathThis test correctly verifies that an exception is raised when an empty field path is provided, preventing runtime errors.
97-113: Comprehensive sequence testGreat test demonstrating multiple transformations applied sequentially on different fields. This verifies that transformations compose well without interfering with each other.
unit_tests/sources/declarative/transformations/config_transformations/test_config_remove_fields.py (4)
10-29: Well-structured test for top-level field removalThe test effectively verifies that specified fields are removed while unrelated fields remain intact. Good explicit assertion of the final state with
assert config == {...}.
31-61: Thorough test for nested field removalThis test comprehensively checks that nested fields are properly removed while maintaining the structure of the configuration. The multiple assertions provide good clarity about the expected state.
63-80: Good error tolerance verificationThis test confirms that attempting to remove non-existent fields doesn't throw errors and leaves the configuration unchanged. This verifies the robustness of the transformation when handling unexpected inputs.
82-99: Effective conditional removal testThe test appropriately verifies that fields are only removed when the specified condition evaluates to true. This demonstrates the conditional application feature works correctly.
unit_tests/sources/declarative/transformations/config_transformations/test_config_add_fields.py (7)
7-12: Clean imports with useful aliasGood practice using an alias for the transformation class to make the tests more readable. The separate import for
AddedFieldDefinitionis also clear.
15-30: Clear test for adding multiple static fieldsThis test effectively verifies that multiple static fields can be added at the top level. The explicit assertion of the entire expected config is a good practice.
32-43: Good test for nested field creationThis test verifies that nested dictionaries are properly created when adding fields at deep paths. This is an important feature for working with complex configurations.
45-61: Comprehensive interpolation testGreat test that verifies both string interpolation (referencing existing config values) and expression evaluation. It demonstrates the flexibility of the transformation for dynamic value generation.
63-69: Proper validation testsGood tests ensuring that invalid inputs (empty path or non-string value) raise appropriate exceptions. This helps prevent runtime errors from invalid configurations.
71-82: Overwrite behavior verificationThis test confirms that existing fields are properly overwritten when a new value is added at the same path. This is important for understanding the precedence behavior of the transformation.
84-100: Effective conditional addition testThis test correctly verifies that fields are only added when the specified condition evaluates to true. This demonstrates the conditional application feature works properly.
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1)
35-61: Ensure map values are type-compatible with field valuesThere's currently no validation that the mapped value has the same type as the original field value, which could lead to unexpected behavior downstream. Would it make sense to add optional type checking or at least a warning if types differ significantly?
airbyte_cdk/sources/declarative/transformations/config_transformations/remove_fields.py (2)
44-46: Should empty field_pointers be validated?There's validation for empty field paths in the AddFields transformation, but no validation that field_pointers itself is not empty. Is this intentional? Having an empty list of field_pointers would make this transformation a no-op.
1-67: Overall implementation looks clean and well-documented!The use of dpath for handling nested fields is a good choice, and the conditional application of the transformation is well implemented. The detailed docstring explaining special handling of list elements is particularly helpful.
airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py (2)
114-120: Consider handling overwrite behavior explicitlyThe current implementation uses dpath.new without specifying how to handle the case where the field already exists. You might want to provide control over whether existing fields should be overwritten or preserved. wdyt?
if is_empty_condition or self._filter_interpolator.eval( config, value=value, path=parsed_field.path ): - dpath.new(config, parsed_field.path, value) + # Allow explicit control over overwriting behavior + dpath.new(config, parsed_field.path, value, overwrite=True) # or maybe make this configurable
1-121: Well-structured implementation with good documentationThe overall design is clean with appropriate use of helper classes and good documentation including YAML examples. The type validation for values is a nice touch for ensuring data integrity.
There was a problem hiding this comment.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.py (1)
1-129: 💡 Verification agent🧩 Analysis chain
Format the file to fix linter errors.
The pipeline failures indicate a Ruff format issue. Running
ruff formaton this file should resolve the formatting issues.
🏁 Script executed:
#!/bin/bash # Check which formatting issues Ruff would fix ruff format --diff unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.pyLength of output: 2159
Format
test_config_remap_field.pywithruffto fix trailing commas, blank lines, and line wrappingTo resolve the linter errors, you can run:
ruff format -i unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.pyOr manually apply the following changes:
• Add a trailing comma after
config=config
• Remove the extra blank line beforemapping = {...}
• Wrap the two longRemapField(...).transform(...)calls in parentheses--- a/unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.py +++ b/unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.py @@ -20,7 +20,7 @@ remap_transform = RemapField( field_path=["authorization", "auth_type"], map={"client_credentials": "oauth2", "api_key": "key_auth"}, - config=config + config=config, ) @@ -113,7 +113,6 @@ def test_amazon_seller_partner_marketplace_remap_with_interpolated_mapping(self): - mapping = { "endpoint": { "ES": "{{ 'https://sellingpartnerapi' if config.environment == 'production' else 'https://sandbox.sellingpartnerapi' }}-eu.amazon.com", @@ -121,8 +120,12 @@ production_config = {"environment": "production", "marketplace": "ES"} - RemapField(field_path=["marketplace"], map=mapping["endpoint"], config=sandbox_config).transform(sandbox_config) - RemapField(field_path=["marketplace"], map=mapping["endpoint"], config=production_config).transform(production_config) + RemapField( + field_path=["marketplace"], map=mapping["endpoint"], config=sandbox_config + ).transform(sandbox_config) + RemapField( + field_path=["marketplace"], map=mapping["endpoint"], config=production_config + ).transform(production_config)Wdyt?
🧰 Tools
🪛 GitHub Actions: Linters
[error] 1-1: Ruff format check failed. File would be reformatted. Run 'ruff format' to fix code style issues.
🧹 Nitpick comments (11)
airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py (5)
1-3: Check the copyright year.The copyright notice lists 2025, which is a future date. Should this be updated to the current year?
- # Copyright (c) 2025 Airbyte, Inc., all rights reserved. + # Copyright (c) 2023 Airbyte, Inc., all rights reserved.wdyt?
73-103: Consider handling unexpected value types more gracefully.The validation logic correctly handles InterpolatedString and str types, but what happens with other types? Currently, it would raise a ValueError, but the error message only mentions expecting a string value, which might be confusing.
if not isinstance(add_field.value, InterpolatedString): if not isinstance(add_field.value, str): raise ValueError( - f"Expected a string value for the AddFields transformation: {add_field}" + f"Expected a string or InterpolatedString value for the AddFields transformation, but got {type(add_field.value).__name__}: {add_field}" )wdyt?
104-120: Document overwriting behavior and consider exception handling.The transform method looks solid! A few things to consider:
dpath.newwill overwrite existing values at the specified path. This behavior isn't explicitly documented.- There's no exception handling for the dpath.new call, which could fail if there are issues with the path or value.
Consider updating the method docstring to mention the overwrite behavior and adding try-except handling:
def transform( self, config: MutableMapping[str, Any], ) -> None: """ Transforms a config by adding fields based on the provided field definitions. + If a field already exists at the specified path, it will be overwritten. :param config: The user-provided configuration to be transformed """ for parsed_field in self._parsed_fields: valid_types = (parsed_field.value_type,) if parsed_field.value_type else None value = parsed_field.value.eval(config, valid_types=valid_types) if not self.condition or self._filter_interpolator.eval( config, value=value, path=parsed_field.path ): + try: dpath.new(config, parsed_field.path, value) + except Exception as e: + raise ValueError(f"Failed to add field at path {parsed_field.path}: {str(e)}")wdyt?
67-71: Document the condition parameter and its context variables.The
conditionparameter enables conditional field addition, which is powerful! However, it would be helpful to document thatvalueandpathare available as variables in the condition evaluation, as seen in thetransformmethod.Consider enhancing the docstring to clarify this:
fields: List[AddedFieldDefinition] + # A boolean expression that determines whether a field should be added. + # The expression can reference the config being transformed, as well as + # `value` (the evaluated field value) and `path` (the field path). condition: str = "" _parsed_fields: List[ParsedAddFieldDefinition] = field( init=False, repr=False, default_factory=list )wdyt?
36-66: Excellent documentation with helpful YAML examples!The class documentation is thorough and the YAML examples make it easy to understand how to use this transformation. Consider also adding an example of using the conditional feature since it's a powerful capability.
# by supplying any valid Jinja template directive or expression - path: ["two_times_two"] value: "{{ 2 * 2 }}" + + # conditional field addition + - path: ["optional_field"] + value: "{{ config.source_value }}" + condition: "{{ value != 'skip_me' }}"wdyt?
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (5)
2-2: Copyright year appears to be set to the future.The copyright year is set to 2025, but we're currently in 2024. Should we update the year to the current one? wdyt?
- # Copyright (c) 2025 Airbyte, Inc., all rights reserved. + # Copyright (c) 2024 Airbyte, Inc., all rights reserved.
19-20: Should docstring mention interpolated mapping support?The docstring describes this as a "transformation that remaps a field's value based on a static map", but I can see from the implementation and tests that it also supports interpolated values in the map. Might be helpful to clarify this in the docstring to make the capability more discoverable. wdyt?
- Transformation that remaps a field's value to another value based on a static map. + Transformation that remaps a field's value to another value based on a map. + The map values can include interpolated strings that reference config values.
29-36: Simplify field_path conversion logic.The current implementation creates
_field_pathand then potentially overwrites elements in a separate loop. Perhaps we could simplify this to a single operation? wdyt?- self._field_path = [ - InterpolatedString.create(path, parameters={}) for path in self.field_path - ] - for path_index in range(len(self.field_path)): - if isinstance(self.field_path[path_index], str): - self._field_path[path_index] = InterpolatedString.create( - self.field_path[path_index], parameters={} - ) + self._field_path = [ + path if isinstance(path, InterpolatedString) else InterpolatedString.create(path, parameters={}) + for path in self.field_path + ]
27-28: Consider using a more specific exception type.The code raises a generic Exception when the field_path is empty. Using a more specific exception type like ValueError could make error handling more consistent across the codebase. wdyt?
- raise Exception("field_path cannot be empty.") + raise ValueError("field_path cannot be empty.")
44-47: Docstring should mention early return behavior.The transform method silently returns without making changes if the field path doesn't exist or if intermediate nodes aren't mappings. This is fine, but should be documented in the docstring for clarity. wdyt?
Transforms a config by remapping a field value based on the provided map. If the original value is found in the map, it's replaced with the mapped value. If the value is not in the map, the field remains unchanged. +If any part of the field path doesn't exist or isn't a mapping, no changes are made.unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.py (1)
94-97: Test could be more specific about exception.The test verifies that an exception is raised with an empty field path, but doesn't check the exception message. Would it be helpful to be more specific about the expected exception type and message? wdyt?
- with pytest.raises(Exception): + with pytest.raises(Exception, match="field_path cannot be empty."):
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py(1 hunks)airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py(1 hunks)unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.py(1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Linters
unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.py
[error] 1-1: Ruff format check failed. File would be reformatted. Run 'ruff format' to fix code style issues.
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (9)
airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py (1)
18-25: The class design looks great!Good job on defining a clear, immutable data structure for field additions with appropriate type annotations.
unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.py (8)
1-10: The imports look good!The necessary imports for testing are present, and the style aligns with the rest of the codebase.
🧰 Tools
🪛 GitHub Actions: Linters
[error] 1-1: Ruff format check failed. File would be reformatted. Run 'ruff format' to fix code style issues.
12-37: Comprehensive basic test case.This test thoroughly verifies the basic field remapping functionality, including checking that only the intended field is modified while others remain unchanged. Good job!
38-52: Good edge case testing.Testing the behavior when a value isn't present in the mapping is important. This ensures the transformer doesn't modify fields unexpectedly.
53-68: Good test for non-existent field paths.This test ensures that when the field path doesn't exist in the config, no changes are made to the config. This is important for ensuring the transformer behaves correctly in all scenarios.
69-83: Good test for interpolated path support.This test verifies that the transformer correctly handles interpolated field paths, which is an important feature for dynamic configurations.
84-93: Good edge case testing for empty map.Testing with an empty map is important to ensure the transformer handles this edge case correctly without modifying the config.
98-114: Good test for sequential transformations.This test ensures that multiple transformations can be applied in sequence, which is important for complex configuration scenarios.
115-129: Excellent real-world test case.This test with the Amazon Seller Partner use case validates that the transformer handles complex interpolated mappings with conditional logic. This is exactly the scenario mentioned in the previous review comment.
Brian Lai (brianjlai)
left a comment
There was a problem hiding this comment.
just a few small things, but this looks ready to go. nice work! ✅
let me know if my explanation for the runtime vs. parse time eval of the mapping makes sense too since its a somewhat subtle way that we think about interpolation processing
What
How
Validatorbase classDpathValidatorandPredicateValidatorvalidator classesValidationStrategybase classValidateAdheresToSchemavalidation strategy classConfigTransformationbase classRemapFieldconfig transformation classRecommended Review Order
validator.pydpath_validator.pyand unit testspredicate_validator.pyand unit testsvalidation_strategy.pyvalidate_adheres_to_schemaand unit testsconfig_transformation.pyremap_field.pyand unit testsSummary by CodeRabbit